Discover the power of GPU computing to accelerate your research on UCLA’s Hoffman2 cluster! This workshop is designed to guide you through the essentials of GPU utilization, enhancing your projects with cutting-edge computational efficiency. ⭐
🔑 Key Topics:
For suggestions: cpeterson@oarc.ucla.edu
This presentation and accompanying materials are available on 🔗 UCLA OARC GitHub Repository
You can view the slides in:
Each file provides detailed instructions and examples on the various topics covered in this workshop.
Note: 🛠️ This presentation was built using Quarto and RStudio.
Graphic Processing Units (GPUs) were initially developed for processing graphics and visual operations, as CPUs were too slow for these tasks. The architecture of GPUs allows them to handle massively parallel tasks efficiently.
In the mid-2000s, GPUs began to be used for non-graphical computations. NVIDIA introduced CUDA, a parallel computing platform and programming model that lets developers compile general-purpose (non-graphics) programs for GPUs, spearheading the era of General-Purpose GPU computing (GPGPU).
GPUs are ubiquitous and found in devices ranging from PCs to mobile phones, and gaming consoles like Xbox and PlayStation.
Though initially designed for graphics, GPUs are now used in a wide range of applications.
The significant speedup offered by GPUs comes from their ability to parallelize operations over thousands of cores, unlike traditional CPUs.
picture source NVIDIA
picture source NVIDIA
There are multiple GPU types available in the cluster.
Each GPU has a different compute capability, memory size and clock speed.
| GPU type | # CUDA cores | VMem | SGE option |
|---|---|---|---|
| NVIDIA A100 | 6912 | 80 GB | -l gpu,A100,cuda=1 |
| Tesla V100 | 5120 | 32 GB | -l gpu,V100,cuda=1 |
| RTX 2080 Ti | 4352 | 11 GB | -l gpu,RTX2080Ti,cuda=1 |
| Tesla P4 | 2560 | 8 GB | -l gpu,P4,cuda=1 |
Warning
Using the -l gpu option only reserves a GPU for your job.
You will still need to use GPU optimized software and libraries to take advantage of the GPU’s parallel processing power.
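To avoid typos in resource strings, the SGE options from the table above can be looked up by GPU type; a minimal Python sketch (the helper name is hypothetical, and the mapping is copied from the table):

```python
# Map each GPU type on Hoffman2 to its SGE resource request
# (values taken from the GPU table above).
SGE_GPU_OPTIONS = {
    "A100": "-l gpu,A100,cuda=1",
    "V100": "-l gpu,V100,cuda=1",
    "RTX2080Ti": "-l gpu,RTX2080Ti,cuda=1",
    "P4": "-l gpu,P4,cuda=1",
}

def gpu_request(gpu_type: str) -> str:
    """Return the qsub resource option for a GPU type (hypothetical helper)."""
    try:
        return SGE_GPU_OPTIONS[gpu_type]
    except KeyError:
        raise ValueError(f"Unknown GPU type: {gpu_type!r}; "
                         f"choose from {sorted(SGE_GPU_OPTIONS)}")

print(gpu_request("V100"))  # -l gpu,V100,cuda=1
```

A wrapper like this is handy when generating job scripts programmatically, but passing the option string directly to qsub works just as well.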
The following sections will cover how to compile and run GPU optimized code on Hoffman2.
CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) from NVIDIA. It allows developers to write programs that execute on GPUs.
On Hoffman2, you can compile CUDA code by loading the cuda module. This modifies your environment to use the CUDA Toolkit, which provides the libraries and compilers needed to build and run CUDA code.
picture source NVIDIA
We will walk through a simple example of CUDA code that performs a matrix multiplication (1024x1024).
MatrixMult folder
- Matrix-cpu.cpp contains the CPU (serial) code
- Matrix-gpu.cu contains the CUDA code
- MatrixMult.job is the job submission file

Be on the lookout for GPU-optimized software for your research!
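For reference, the serial algorithm in Matrix-cpu.cpp is the classic triple loop; a minimal pure-Python sketch of the same computation (illustrative only — the workshop files implement this in C++/CUDA, and at 1024x1024 the GPU version distributes the independent (i, j) results across thousands of threads):

```python
def matmul(A, B):
    """Serial matrix multiplication C = A @ B using the classic triple loop."""
    n, k, m = len(A), len(B), len(B[0])
    assert len(A[0]) == k, "inner dimensions must match"
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):          # every (i, j) entry is independent ...
        for j in range(m):      # ... which is exactly what a GPU parallelizes
            s = 0.0
            for p in range(k):
                s += A[i][p] * B[p][j]
            C[i][j] = s
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```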
Other GPU platforms include:
There are several Python and R packages that use GPUs for various data-intensive tasks, like Machine Learning, Deep Learning, and large-scale data processing.
Python:
R:
Installing TensorFlow and PyTorch on Hoffman2 is straightforward using the Anaconda package manager. (Check out my Workshop on using Anaconda)
Create a new conda environment with CUDA tools.
Install TensorFlow/PyTorch with GPU support and the NVIDIA libraries
Verify the installation. GPU detection will only succeed when you are on a GPU-enabled node.
# TensorFlow Test:
python -c "import tensorflow as tf; print('TensorFlow is using:', ('GPU: ' + tf.test.gpu_device_name()) if tf.test.is_gpu_available() else 'CPU')"
# PyTorch Test:
python -c "import torch; print('PyTorch is using:', ('GPU: ' + torch.cuda.get_device_name(0)) if torch.cuda.is_available() else 'CPU')"

This example focuses on the “Fashion MNIST” dataset, a collection used frequently in machine learning for image recognition tasks.
Approach:
Dataset Overview:
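For context, Fashion MNIST contains 70,000 28x28 grayscale images (60,000 training, 10,000 test) in 10 clothing categories; a small sketch of the standard label-to-name mapping:

```python
# Standard Fashion MNIST class names, indexed by integer label 0-9.
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_name(label: int) -> str:
    """Translate a Fashion MNIST integer label into its class name."""
    return FASHION_MNIST_CLASSES[label]

print(label_name(9))  # Ankle boot
```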
Now that we have TensorFlow installed, we can run some examples to test the GPU acceleration.
Files in the TF-Torch folder contain examples of using TensorFlow on Hoffman2.
🧬 DNA Sequence Classification with PyTorch
Processing large genomic datasets such as VCF files can be computationally intensive and time-consuming. Leveraging GPU acceleration can significantly reduce processing times, allowing for more rapid data analysis and insights.
We will
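Before a PyTorch model can classify sequences, DNA strings are typically one-hot encoded; a minimal stdlib sketch of that preprocessing step (the encoding scheme here is a common convention, not necessarily the one used in the workshop files):

```python
# One-hot encode a DNA sequence: each base becomes a 4-element vector.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA string as one-hot vectors (unknown bases -> all zeros)."""
    out = []
    for base in seq.upper():
        vec = [0, 0, 0, 0]
        idx = BASE_INDEX.get(base)
        if idx is not None:
            vec[idx] = 1
        out.append(vec)
    return out

print(one_hot("ACGT"))  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

The resulting list converts directly into a tensor (e.g. torch.tensor(one_hot(seq), dtype=torch.float32)) for training.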
RAPIDS is a suite of open-source software libraries and APIs built on CUDA that enables end-to-end data science and analytics pipelines to run on GPUs. cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
In this example, we will use cuDF to load and filter genomic data efficiently using GPU acceleration.
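Because cuDF mirrors the pandas API, the same filtering code runs on GPU or CPU depending only on the import; a minimal sketch (the column names and quality threshold are illustrative, not taken from the workshop's VCF data):

```python
try:
    import cudf as df_lib    # GPU DataFrame (RAPIDS) -- used when available
except ImportError:
    import pandas as df_lib  # CPU fallback with a nearly identical API

# Toy variant table; a real run would load VCF-derived data with df_lib.read_csv(...)
variants = df_lib.DataFrame({
    "chrom": ["chr1", "chr1", "chr2", "chr2"],
    "pos": [101, 202, 303, 404],
    "qual": [55.0, 12.0, 88.0, 30.0],
})

# Keep only high-quality variants (illustrative threshold).
high_quality = variants[variants["qual"] >= 50.0]
print(len(high_quality))  # 2
```

On a GPU node with RAPIDS installed, the cudf import wins and the filter executes on the GPU; elsewhere the identical code runs on the CPU via pandas.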
Files in the rapids folder
- rapids_analysis-gpu.py - GPU version
- rapids_analysis-cpu.py - CPU version

The rapid_analysis.job file will submit the job to the Hoffman2 cluster.
In this file, the line #$ -l gpu,V100 will submit this job to the V100 GPU nodes.
We will use R and install the H2O.ai package to run the example.
In the h2oai folder, the h2oaiXGBoost.R script contains the code to run XGBoost on the Combined Cycle Power Plant dataset.
The h2oML-gpu.job file will submit the job to the Hoffman2 cluster on a GPU node.
The h2oML-cpu.job file will submit the job to the Hoffman2 cluster on a CPU node.
The h2o.ai functions will automatically detect the GPU and use it for training.
Hoffman2 has the resources and tools to help you leverage the power of GPUs for your research.
Main Takeaways:
Use the -l gpu option to reserve a GPU node